{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_02_3_pandas_grouping.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 2: Python for Machine Learning**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 2 Material\n",
    "\n",
    "Main video lecture:\n",
    "\n",
    "* Part 2.1: Introduction to Pandas [[Video]](https://www.youtube.com/watch?v=bN4UuCBdpZc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_02_1_python_pandas.ipynb)\n",
    "* Part 2.2: Categorical Values [[Video]](https://www.youtube.com/watch?v=4a1odDpG0Ho&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_02_2_pandas_cat.ipynb)\n",
    "* **Part 2.3: Grouping, Sorting, and Shuffling in Python Pandas** [[Video]](https://www.youtube.com/watch?v=YS4wm5gD8DM&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_02_3_pandas_grouping.ipynb)\n",
    "* Part 2.4: Using Apply and Map in Pandas for Keras [[Video]](https://www.youtube.com/watch?v=XNCEZ4WaPBY&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_02_4_pandas_functional.ipynb)\n",
    "* Part 2.5: Feature Engineering in Pandas for Deep Learning in Keras [[Video]](https://www.youtube.com/watch?v=BWPTj4_Mi9E&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_02_5_pandas_features.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2.3: Grouping, Sorting, and Shuffling  \n",
    "\n",
    "We will take a look at a few ways to affect an entire Pandas data frame. These techniques will allow us to group, sort, and shuffle data sets. These are all essential operations for both data preprocessing and evaluation.\n",
    "\n",
    "## Shuffling a Dataset\n",
    "There may be information lurking in the order of the rows of your dataset. Unless you are dealing with time-series data, the order of the rows should not be significant. Consider if your training set included employees in a company. Perhaps this dataset is ordered by the number of years the employees were with the company. It is okay to have an individual column that specifies years of service. However, having the data in this order might be problematic.  \n",
    "\n",
    "Consider if you were to split the data into training and validation. You could end up with your validation set having only the newer employees and the training set longer-term employees. Separating the data into a k-fold cross validation could have similar problems. Because of these issues, it is important to shuffle the data set.\n",
    "\n",
    "Often shuffling and reindexing are both performed together. Shuffling randomizes the order of the data set. However, it does not change the Pandas row numbers. The following code demonstrates a reshuffle. Notice that the program has not reset the row indexes' first column. Generally, this will not cause any issues and allows tracing back to the original order of the data. However, I usually prefer to reset this index. I reason that I typically do not care about the initial position, and there are a few instances where this unordered index can cause issues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>...</th>\n",
       "      <th>year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>117</th>\n",
       "      <td>29.0</td>\n",
       "      <td>4</td>\n",
       "      <td>68.0</td>\n",
       "      <td>...</td>\n",
       "      <td>73</td>\n",
       "      <td>2</td>\n",
       "      <td>fiat 128</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>245</th>\n",
       "      <td>36.1</td>\n",
       "      <td>4</td>\n",
       "      <td>98.0</td>\n",
       "      <td>...</td>\n",
       "      <td>78</td>\n",
       "      <td>1</td>\n",
       "      <td>ford fiesta</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>...</td>\n",
       "      <td>73</td>\n",
       "      <td>1</td>\n",
       "      <td>ford gran torino</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>10.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>...</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>chevy c20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>398 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      mpg  cylinders  displacement  ...  year  origin              name\n",
       "117  29.0          4          68.0  ...    73       2          fiat 128\n",
       "245  36.1          4          98.0  ...    78       1       ford fiesta\n",
       "..    ...        ...           ...  ...   ...     ...               ...\n",
       "88   14.0          8         302.0  ...    73       1  ford gran torino\n",
       "26   10.0          8         307.0  ...    70       1         chevy c20\n",
       "\n",
       "[398 rows x 9 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/auto-mpg.csv\", \n",
    "    na_values=['NA', '?'])\n",
    "\n",
    "#np.random.seed(42) # Uncomment this line to get the same shuffle each time\n",
    "df = df.reindex(np.random.permutation(df.index))\n",
    "\n",
    "pd.set_option('display.max_columns', 7)\n",
    "pd.set_option('display.max_rows', 5)\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code demonstrates a reindex.  Notice how the reindex orders the row indexes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>...</th>\n",
       "      <th>year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>29.0</td>\n",
       "      <td>4</td>\n",
       "      <td>68.0</td>\n",
       "      <td>...</td>\n",
       "      <td>73</td>\n",
       "      <td>2</td>\n",
       "      <td>fiat 128</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>36.1</td>\n",
       "      <td>4</td>\n",
       "      <td>98.0</td>\n",
       "      <td>...</td>\n",
       "      <td>78</td>\n",
       "      <td>1</td>\n",
       "      <td>ford fiesta</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>396</th>\n",
       "      <td>14.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>...</td>\n",
       "      <td>73</td>\n",
       "      <td>1</td>\n",
       "      <td>ford gran torino</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>397</th>\n",
       "      <td>10.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>...</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>chevy c20</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>398 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      mpg  cylinders  displacement  ...  year  origin              name\n",
       "0    29.0          4          68.0  ...    73       2          fiat 128\n",
       "1    36.1          4          98.0  ...    78       1       ford fiesta\n",
       "..    ...        ...           ...  ...   ...     ...               ...\n",
       "396  14.0          8         302.0  ...    73       1  ford gran torino\n",
       "397  10.0          8         307.0  ...    70       1         chevy c20\n",
       "\n",
       "[398 rows x 9 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "pd.set_option('display.max_columns', 7)\n",
    "pd.set_option('display.max_rows', 5)\n",
    "\n",
    "df.reset_index(inplace=True, drop=True)\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sorting a Data Set\n",
    "\n",
    "While it is always good to shuffle a data set before training, during training and preprocessing, you may also wish to sort the data set. Sorting the data set allows you to order the rows in either ascending or descending order for one or more columns. The following code sorts the MPG dataset by name and displays the first car."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The first car is: amc ambassador brougham\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>...</th>\n",
       "      <th>year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>96</th>\n",
       "      <td>13.0</td>\n",
       "      <td>8</td>\n",
       "      <td>360.0</td>\n",
       "      <td>...</td>\n",
       "      <td>73</td>\n",
       "      <td>1</td>\n",
       "      <td>amc ambassador brougham</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>390.0</td>\n",
       "      <td>...</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>amc ambassador dpl</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>325</th>\n",
       "      <td>44.3</td>\n",
       "      <td>4</td>\n",
       "      <td>90.0</td>\n",
       "      <td>...</td>\n",
       "      <td>80</td>\n",
       "      <td>2</td>\n",
       "      <td>vw rabbit c (diesel)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>293</th>\n",
       "      <td>31.9</td>\n",
       "      <td>4</td>\n",
       "      <td>89.0</td>\n",
       "      <td>...</td>\n",
       "      <td>79</td>\n",
       "      <td>2</td>\n",
       "      <td>vw rabbit custom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>398 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      mpg  cylinders  displacement  ...  year  origin                     name\n",
       "96   13.0          8         360.0  ...    73       1  amc ambassador brougham\n",
       "9    15.0          8         390.0  ...    70       1       amc ambassador dpl\n",
       "..    ...        ...           ...  ...   ...     ...                      ...\n",
       "325  44.3          4          90.0  ...    80       2     vw rabbit c (diesel)\n",
       "293  31.9          4          89.0  ...    79       2         vw rabbit custom\n",
       "\n",
       "[398 rows x 9 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/auto-mpg.csv\", \n",
    "    na_values=['NA', '?'])\n",
    "\n",
    "df = df.sort_values(by='name', ascending=True)\n",
    "print(f\"The first car is: {df['name'].iloc[0]}\")\n",
    "      \n",
    "pd.set_option('display.max_columns', 7)\n",
    "pd.set_option('display.max_rows', 5)\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouping a Data Set\n",
    "\n",
    "Grouping is a typical operation on data sets.  Structured Query Language (SQL) calls this operation a \"GROUP BY.\"  Programmers use grouping to summarize data.  Because of this, the summarization row count will usually shrink, and you cannot undo the grouping.  Because of this loss of information, it is essential to keep your original data before the grouping. \n",
    "\n",
    "We use the Auto MPG dataset to demonstrate grouping."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mpg</th>\n",
       "      <th>cylinders</th>\n",
       "      <th>displacement</th>\n",
       "      <th>...</th>\n",
       "      <th>year</th>\n",
       "      <th>origin</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>...</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>chevrolet chevelle malibu</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>...</td>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "      <td>buick skylark 320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>396</th>\n",
       "      <td>28.0</td>\n",
       "      <td>4</td>\n",
       "      <td>120.0</td>\n",
       "      <td>...</td>\n",
       "      <td>82</td>\n",
       "      <td>1</td>\n",
       "      <td>ford ranger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>397</th>\n",
       "      <td>31.0</td>\n",
       "      <td>4</td>\n",
       "      <td>119.0</td>\n",
       "      <td>...</td>\n",
       "      <td>82</td>\n",
       "      <td>1</td>\n",
       "      <td>chevy s-10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>398 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      mpg  cylinders  displacement  ...  year  origin  \\\n",
       "0    18.0          8         307.0  ...    70       1   \n",
       "1    15.0          8         350.0  ...    70       1   \n",
       "..    ...        ...           ...  ...   ...     ...   \n",
       "396  28.0          4         120.0  ...    82       1   \n",
       "397  31.0          4         119.0  ...    82       1   \n",
       "\n",
       "                          name  \n",
       "0    chevrolet chevelle malibu  \n",
       "1            buick skylark 320  \n",
       "..                         ...  \n",
       "396                ford ranger  \n",
       "397                 chevy s-10  \n",
       "\n",
       "[398 rows x 9 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/auto-mpg.csv\", \n",
    "    na_values=['NA', '?'])\n",
    "\n",
    "pd.set_option('display.max_columns', 7)\n",
    "pd.set_option('display.max_rows', 5)\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use the above data set with the group to perform summaries.  For example, the following code will group cylinders by the average (mean).  This code will provide the grouping.  In addition to **mean**, you can use other aggregating functions, such as **sum** or **count**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "cylinders\n",
       "3    20.550000\n",
       "4    29.286765\n",
       "5    27.366667\n",
       "6    19.985714\n",
       "8    14.963107\n",
       "Name: mpg, dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = df.groupby('cylinders')['mpg'].mean()\n",
    "g"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It might be useful to have these **mean** values as a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{3: 20.55,\n",
       " 4: 29.28676470588236,\n",
       " 5: 27.366666666666664,\n",
       " 6: 19.985714285714284,\n",
       " 8: 14.963106796116508}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = g.to_dict()\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A dictionary allows you to access an individual element quickly.  For example, you could quickly look up the mean for six-cylinder cars.  You will see that target encoding, introduced later in this module, uses this technique. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "19.985714285714284"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d[6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code below shows how to count the number of rows that match each cylinder count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{3: 4, 4: 204, 5: 3, 6: 84, 8: 103}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('cylinders')['mpg'].count().to_dict()"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
